Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech Synthesis
نویسندگان
چکیده
In Thai language, stress is an important prosodic feature that not only affects naturalness but also has a crucial role in meaning of phrase-level utterance. It is seen that a speech synthesis model that is trained with lack of stress and phrase-level information causes incorrect tones and ambiguity in meaning of synthetic speech. Our previous work has shown that manually annotated stress information improves naturalness of synthetic speech. However, a high time consumption is a drawback of the manual annotation. In this paper, we utilize an unsupervised learning technique called Bayesian Gaussian process latent variable model (Bayesian GP-LVM) to automatically put stress annotation on the given training data. Stress related features are projected onto a latent space in which syllables are easier classified into stressed/unstressed classes. We use the stressed/unstressed information as an additional context in GPR-based speech synthesis. Experimental results show that the proposed technique improves naturalness of synthetic speech as well as accuracy of stressed/unstressed classification. Moreover, the proposed technique enables us to avoid ambiguity in meaning of synthetic speech by providing intended stress position into context label sequence to be synthesized.
منابع مشابه
Tone modeling using Gaussian process latent variable model for statistical speech synthesis
In continuous speech of Thai language, tone pronunciation is affected by several factors. One of significant factors is stress that causes a diversity of F0 contours of tone, and affects syllable durations. Our previous studies have shown that a stressed/unstressed syllable context improves tone modeling accuracy. However, the stress in Thai language is generally unknown for a given input text ...
متن کاملSpeech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model
In this work, synthesis of facial animation is done by modelling the mapping between facial motion and speech using the shared Gaussian process latent variable model. Both data are processed separately and subsequently coupled together to yield a shared latent space. This method allows coarticulation to be modelled by having a dynamical model on the latent space. Synthesis of novel animation is...
متن کاملGaussian Process Latent Random Field
The Gaussian process latent variable model (GPLVM) is an unsupervised probabilistic model for nonlinear dimensionality reduction. A supervised extension, called discriminative GPLVM (DGPLVM), incorporates supervisory information into GPLVM to enhance the classification performance. However, its limitation of the latent space dimensionality to at most C − 1 (C is the number of classes) leads to ...
متن کاملStochastic Variational Inference for Gaussian Process Latent Variable Models using Back Constraints
Gaussian process latent variable models (GPLVMs) are a probabilistic approach to modelling data that employs Gaussian process mapping from latent variables to observations. This paper revisits a recently proposed variational inference technique for GPLVMs and methodologically analyses the optimality and different parameterisations of the variational approximation. We investigate a structured va...
متن کاملGaussian Process Latent Variable Alignment Learning
We present a model that can automatically learn alignments between high-dimensional data in an unsupervised manner. Learning alignments is an ill-constrained problem as there are many different ways of defining a good alignment. Our proposed method casts alignment learning in a framework where both alignment and data are modelled simultaneously. We derive a probabilistic model built on non-para...
متن کامل